Fix: coerce unknown types to O dtype#10339
Conversation
for more information, see https://pre-commit.ci
|
do you think we could add a quick comment in the code describing what it's doing, for the less informed of us? (does anyone know this better?) |
|
Sure, done. Another options for this would be to check all the args passed to the |
|
@keewis is there any timeline for this to be reviewed? We're currently stuck with xarray v2024.2.0 as a dependency until this is resolved which is starting to cause issues when used in some working environments. |
There was a problem hiding this comment.
sorry, this had completely dropped off my radar.
If this is about scalars, note that you can already wrap your custom object in a 0D array of object dtype to work around this (so you don't have to wait on us)
@shoyer, do you have an opinion on this?
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
for more information, see https://pre-commit.ci
|
yeah, I thought I'd help you fix the untested parts of what I suggested |
|
@keewis is looks like this one is good to merge. Right? Just needs a note in "what's new" |
|
@jsignell this may have been superseded by #10423? It certainly seems to be at odds with #10423 since a bunch of the extension array tests are now failing. Let me see if removing the implementation here introduced here but keeping the tests (and checking against the MWE I raised in the issue) works as expected. EDIT: the answer to that is no, it doesn't work as expected. The implementation in #10423 means that it won't promote arbitrary python objects to The simplest option would be to use a try/except in |
|
the new tests pass with the code on |
|
@keewis I've removed the two methods previously added in this PR, in favour of the TypeError fallback. I've then removed |
shoyer
left a comment
There was a problem hiding this comment.
Sorry for neglecting this one for so long!
| builtin_types = ( | ||
| bool, | ||
| int, | ||
| float, | ||
| complex, | ||
| str, | ||
| bytes, | ||
| dt.datetime, | ||
| dt.timedelta, | ||
| ) |
There was a problem hiding this comment.
It does not look like this is used in the current version of this PR
| except TypeError: | ||
| return np.dtype(object) |
There was a problem hiding this comment.
This entire module is intended as forwards compatible replacements for array API functionality, so we shouldn't add behavior here that deviates from result_type in the array API. I would much rather prefer that this changed behavior be done in Xarray's specific dtypes.result_type(). (Also, it should have a brief comment explaining why TypeError mean object dtype is appropriate.)
| def maybe_coerce_to_str(index, original_coords): | ||
| """maybe coerce a pandas Index back to a numpy array of type str | ||
|
|
||
| pd.Index uses object-dtype to store str - try to avoid this for coords | ||
| """ | ||
| from xarray.core import dtypes | ||
|
|
||
| try: | ||
| result_type = dtypes.result_type(*original_coords) | ||
| except (TypeError, ValueError): | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Does this change also need test coverage? I don't see how results.result_type() would cover it.
Added a check for Python objects that neither have numpy dtype
dtypeattributes nor are one of the builtin types so that type promotion succeeds and defaults toobjectdtype.This deals with cases where user-defined instances are in the array which would otherwise trip up
numpy.result_type.I did think about placing this in
xarray.core.dtypes.preprocess_typesbut chose to place it closest to where it causes issues (namely, on callingxp.result_type).sumto custom object arrays withmin_count=1if n_dims <= sum_dims #9755whats-new.rstapi.rst